Learning to Learn to Disambiguate: Meta-Learning for Few-Shot Word Sense Disambiguation
The success of deep learning methods hinges on the availability of large
training datasets annotated for the task of interest. In contrast to human
intelligence, these methods lack versatility and struggle to learn and adapt
quickly to new tasks where labeled data is scarce. Meta-learning aims to solve
this problem by training a model on a large number of few-shot tasks, with an
objective to learn new tasks quickly from a small number of examples. In this
paper, we propose a meta-learning framework for few-shot word sense
disambiguation (WSD), where the goal is to learn to disambiguate unseen words
from only a few labeled instances. Meta-learning approaches have so far been
typically tested in an N-way, K-shot classification setting, where each task
has N classes with K examples per class. Owing to its nature, WSD deviates
from this controlled setup and requires the models to handle a large number of
highly unbalanced classes. We extend several popular meta-learning approaches
to this scenario, and analyze their strengths and weaknesses in this new
challenging setting.
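The episodic N-way, K-shot setup can be illustrated with a small sampling sketch; the data layout, function name, and parameters below are assumptions for illustration, not the paper's implementation. Real WSD episodes are highly unbalanced, which this sketch only approximates by skipping senses with too few instances.

```python
import random

def sample_episode(sense_inventory, n_way=5, k_shot=2, q_queries=2, seed=0):
    """Sample one few-shot WSD episode: N senses (classes), each with
    K support and Q query instances.

    sense_inventory maps a sense label to its labeled example sentences
    (a hypothetical layout). Senses with too few instances are skipped,
    a crude nod to the unbalanced classes real WSD exhibits.
    """
    rng = random.Random(seed)
    eligible = [s for s, ex in sense_inventory.items()
                if len(ex) >= k_shot + q_queries]
    senses = rng.sample(eligible, n_way)
    support, query = [], []
    for label, sense in enumerate(senses):
        picked = rng.sample(sense_inventory[sense], k_shot + q_queries)
        support += [(x, label) for x in picked[:k_shot]]
        query += [(x, label) for x in picked[k_shot:]]
    return support, query
```

A meta-learner would adapt on the support set of each episode and be evaluated on its query set.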
Neural Character-based Composition Models for Abuse Detection
The advent of social media in recent years has fed into some highly
undesirable phenomena such as proliferation of offensive language, hate speech,
sexist remarks, etc. on the Internet. In light of this, there have been several
efforts to automate the detection and moderation of such abusive content.
However, deliberate obfuscation of words by users to evade detection poses a
serious challenge to the effectiveness of these efforts. The current
state-of-the-art approaches to abusive language detection, based on recurrent neural
networks, do not explicitly address this problem and resort to a generic OOV
(out of vocabulary) embedding for unseen words. However, in using a single
embedding for all unseen words we lose the ability to distinguish between
obfuscated and non-obfuscated or rare words. In this paper, we address this
problem by designing a model that can compose embeddings for unseen words. We
experimentally demonstrate that our approach significantly advances the current
state of the art in abuse detection on datasets from two different domains,
namely Twitter and Wikipedia talk pages.
Comment: In Proceedings of the EMNLP Workshop on Abusive Language Online 201
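The idea of composing embeddings for unseen words can be sketched with a character n-gram hashing baseline; the hashing trick here is a training-free stand-in for the paper's learned character-based composition model, and all names are illustrative. An obfuscated word such as "id1ot" shares most character n-grams with "idiot", so composed vectors stay close, whereas a single generic OOV embedding cannot distinguish obfuscated from merely rare words.

```python
import hashlib

def char_ngrams(word, n_min=2, n_max=4):
    """All character n-grams of the word, padded with boundary markers."""
    w = f"<{word}>"
    return [w[i:i + n] for n in range(n_min, n_max + 1)
            for i in range(len(w) - n + 1)]

def compose_embedding(word, dim=64):
    """Compose a unit vector for an unseen word by hashing its character
    n-grams into a fixed number of buckets (the hashing trick), then
    L2-normalising the bucket counts."""
    vec = [0.0] * dim
    for g in char_ngrams(word):
        h = int(hashlib.md5(g.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]
```

In a detection model, such composed vectors would replace the single shared OOV embedding for words absent from the vocabulary.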
Joint Modelling of Emotion and Abusive Language Detection
The rise of online communication platforms has been accompanied by some
undesirable effects, such as the proliferation of aggressive and abusive
behaviour online. Aiming to tackle this problem, the natural language
processing (NLP) community has experimented with a range of techniques for
abuse detection. While achieving substantial success, these methods have so far
only focused on modelling the linguistic properties of the comments and the
online communities of users, disregarding the emotional state of the users and
how this might affect their language. The latter is, however, inextricably
linked to abusive behaviour. In this paper, we present the first joint model of
emotion and abusive language detection, experimenting in a multi-task learning
framework that allows one task to inform the other. Our results demonstrate
that incorporating affective features leads to significant improvements in
abuse detection performance across datasets.
Comment: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 202
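The multi-task recipe, one task informing the other through shared parameters, can be reduced to a toy scalar sketch; the model, data shape, and learning rate below are assumptions chosen for illustration, not the paper's neural architecture.

```python
def mtl_sgd(task_data, lr=0.01, epochs=2000):
    """Toy multi-task SGD: both tasks share one 'encoder' weight w, and each
    keeps its own head h[t]; every example updates the shared weight as well
    as its own head, so one task's gradient signal informs the other.

    task_data: list of (task_id, (x, y)) pairs for two regression tasks.
    """
    w, h = 0.1, [0.1, 0.1]
    for _ in range(epochs):
        for t, (x, y) in task_data:
            err = h[t] * w * x - y
            g_h = err * w * x     # d(0.5*err^2)/dh[t]
            g_w = err * h[t] * x  # d(0.5*err^2)/dw
            h[t] -= lr * g_h
            w -= lr * g_w
    return w, h
```

Here each "task" is a toy regression, but the update pattern, a shared parameter plus per-task heads, is the same mechanism a multi-task abuse/emotion model exploits.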
Finding the Needle in a Haystack: Unsupervised Rationale Extraction from Long Text Classifiers
Long-sequence transformers are designed to improve the representation of
longer texts by language models and their performance on downstream
document-level tasks. However, not much is understood about the quality of
token-level predictions in long-form models. We investigate the performance of
such architectures in the context of document classification with unsupervised
rationale extraction. We find standard soft attention methods to perform
significantly worse when combined with the Longformer language model. We
propose a compositional soft attention architecture that applies RoBERTa
sentence-wise to extract plausible rationales at the token-level. We find this
method to significantly outperform Longformer-driven baselines on sentiment
classification datasets, while also exhibiting significantly lower runtimes.
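The sentence-wise composition can be sketched as two nested softmax attentions; the token scores below are toy numbers standing in for encoder-derived values, and the function names are illustrative rather than the paper's API.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def extract_rationale(sent_token_scores, top_k=3):
    """Two-level soft attention: token scores are normalised within each
    sentence, sentences are weighted by a document-level softmax over their
    peak token score, and the top-k weighted tokens form the rationale.

    sent_token_scores: one {token: score} dict per sentence (toy inputs).
    """
    sent_weights = softmax([max(s.values()) for s in sent_token_scores])
    weighted = []
    for sw, scores in zip(sent_weights, sent_token_scores):
        token_weights = softmax(list(scores.values()))
        for tok, tw in zip(scores.keys(), token_weights):
            weighted.append((tok, sw * tw))
    weighted.sort(key=lambda p: -p[1])
    return [tok for tok, _ in weighted[:top_k]]
```

The sentence-level weighting keeps the token softmax local, which is the property that lets a short-context encoder applied sentence-wise produce document-level rationales.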
Scientific and Creative Analogies in Pretrained Language Models
This paper examines the encoding of analogy in large-scale pretrained
language models, such as BERT and GPT-2. Existing analogy datasets typically
focus on a limited set of analogical relations, with a high similarity of the
two domains between which the analogy holds. As a more realistic setup, we
introduce the Scientific and Creative Analogy dataset (SCAN), a novel analogy
dataset containing systematic mappings of multiple attributes and relational
structures across dissimilar domains. Using this dataset, we test the
analogical reasoning capabilities of several widely-used pretrained language
models (LMs). We find that state-of-the-art LMs achieve low performance on
these complex analogy tasks, highlighting the challenges still posed by analogy
understanding.
Comment: To be published in Findings of EMNLP 202
A Simple and Robust Approach to Detecting Subject-Verb Agreement Errors
While rule-based detection of subject-verb agreement (SVA) errors is sensitive to syntactic parsing errors and to irregularities and exceptions to the main rules, neural sequential labelers have a tendency to overfit their training data. We observe that rule-based error generation is less sensitive to syntactic parsing errors and irregularities than error detection, and we explore a simple yet efficient approach to getting the best of both worlds: we train neural sequential labelers on a combination of large volumes of silver-standard data, obtained through rule-based error generation, and gold-standard data. We show that our simple protocol leads to more robust detection of SVA errors on both in-domain and out-of-domain data, as well as in the context of other errors and long-distance dependencies; across four standard benchmarks, the induced model on average achieves a new state of the art.
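Rule-based error generation of the kind described can be sketched with a toy agreement flip; the verb list and labelling scheme below are illustrative assumptions, and a real generator would rely on a morphological lexicon and a parser to locate subject-verb pairs.

```python
import random

# Toy third-person singular <-> plural pairs; a real generator would use a
# morphological lexicon rather than a hand-written table.
VERB_PAIRS = {"is": "are", "has": "have", "does": "do", "was": "were"}
FLIP = {**VERB_PAIRS, **{v: k for k, v in VERB_PAIRS.items()}}

def generate_sva_errors(tokens, rng):
    """Corrupt agreement on one verb to yield a silver-standard training pair
    (corrupted tokens, per-token error labels). Sentences with no flippable
    verb are returned unchanged with all-zero labels."""
    candidates = [i for i, t in enumerate(tokens) if t in FLIP]
    labels = [0] * len(tokens)
    if not candidates:
        return list(tokens), labels
    i = rng.choice(candidates)
    corrupted = list(tokens)
    corrupted[i] = FLIP[tokens[i]]
    labels[i] = 1
    return corrupted, labels
```

Running such a generator over large unannotated corpora yields the silver-standard data that is then mixed with gold-standard annotations to train the sequential labeler.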